Below, we code up simple simulations corresponding to the different panels of Figure 1.
1.1 Sampling effort
We start by drawing a Directed Acyclic Graph (DAG). Note that we represent unobserved variables that were not shown, or were left implicit, in the manuscript’s figure.
DAG illustrating the assumed causal structure of Figure 1A.
Here, \(y_{[a,b]}\) represents the observed number of interactions from \(a\) to \(b\); \(m_{[a,b]}\) is the true rate of interactions from \(a\) to \(b\); \(S_{[a]}\) and \(S_{[b]}\) represent individual-level sampling efforts; \(S_{[a,b]}\) is the dyad-level sampling effort; \(\gamma_{[a]}\) is an unobserved variable affecting the tendency of an individual, \(a\) to give more or fewer interactions across partners; \(\rho_{[a]}\) is an unobserved variable affecting the tendency of an individual, \(a\), to receive more or fewer interactions across partners;
and \(\tau\) is an unobserved variable affecting the tendency of an individual, \(a\), to give more or fewer interactions to a specific partner, \(b\).
We then write a generative model corresponding to the DAG:
In the following subsections, we follow the same workflow (DAG \(\to\) generative model \(\to\) plot synthetic data) for different causal structures.
1.2 Individual-level causal effects
Directed Acyclic Graph:
DAG illustrating the assumed causal structure of Figure 1B.\(X_{[a]}\) and \(X_{[b]}\) are individual-level phenotypic features (e.g., age, colouration).
Then, the opposite: we show the number of observed interactions \(y_{[a, b]}\) for five actors \(a\) (x-axis) and five recipients \(b\) (one colour per recipient \(b\)).
# We run the functionset.seed(666) df2.3<-scm.2(N_ind =5,sigma_rho =0.5,beta_g =0.2 )# Reorganise data df2.3%>%select(ind_a, ind_b, y_ab) %>%rename(Actor = ind_a,recipient = ind_b,y = y_ab) %>%bind_rows( df2.3%>%select(ind_a, ind_b, y_ba) %>%rename(Actor = ind_b,recipient = ind_a,y = y_ba)) %>%# Plotggplot(aes(x = Actor, y = y, group =as.factor(recipient))) +geom_line(aes(color =as.factor(recipient)), linewidth =1.25) +geom_point(aes(fill =as.factor(recipient)),color ="#f0f2f2",size =4,shape =21,stroke =1) +theme(legend.position ="none",panel.grid.major.x =element_line(color ="#ebe9df", linewidth =0.3)) +scale_color_manual(values =c("#BF402B", "#FECD64", "#C5DAEF", "#2071B3", "#389E55")) +scale_fill_manual(values =c("#BF402B", "#FECD64", "#C5DAEF", "#2071B3", "#389E55"))
1.3 Dyad-level causal effects
Directed Acyclic Graph:
DAG illustrating the assumed causal structure of Figure 1C.\(R_{[a]}\) and \(R_{[b]}\) are the individual-level phenotypic features (e.g., dominance rank), and \(\Delta R_{|a, b|}\) is a dyad-level feature (e.g., difference in rank).
DAG illustrating the assumed causal structure of Figure 1D.\(y_{[a,b]}(t)\) represents the number of observed interactions from \(a\) to \(b\) at time \(t\), and \(y_{[a,b]}(t+1)\) the number of observed interactions from \(a\) to \(b\) at time \(t+1\).
scm.4<-function(N_ind =20, # Nb of individualsN_dyad = ((N_ind * N_ind) - N_ind) /2, # Nb of dyadsdelta =1, # Minimal and initial interaction ratenb_loops =500){# Empty data frame with name and dyad id basic_df <-tibble(ind_a =t(combn(N_ind, 2))[, 1],ind_b =t(combn(N_ind, 2))[, 2],dyad =c(1:N_dyad) )# Data frame at step t = 1 list <-list() list[[1]] <- basic_df %>%mutate(m_ab =rpois(N_dyad, delta),m_ba =rpois(N_dyad, delta),y_ab =rpois(N_dyad, m_ab),y_ba =rpois(N_dyad, m_ba),y_ab_z = (y_ab -mean(c(y_ab, y_ba))) /sd(c(y_ab, y_ba)),y_ba_z = (y_ba -mean(c(y_ab, y_ba))) /sd(c(y_ab, y_ba)),t =1 )# Data frame at steps t > 1for (i in2:nb_loops) { list[[i]] <- basic_df %>%mutate(m_ab = list[[i -1]]$m_ab +ifelse(list[[i -1]]$y_ba >4& list[[i -1]]$m_ab <10, 1, -1) +rnorm(N_dyad, 0, 1.5),m_ab =ifelse(m_ab < delta, delta, m_ab),m_ba = list[[i -1]]$m_ba +ifelse(list[[i -1]]$y_ab >4& list[[i -1]]$m_ba <10, 1, -1) +rnorm(N_dyad, 0, 1.5),m_ba =ifelse(m_ba < delta, delta, m_ba),y_ab =rpois(N_dyad, m_ab),y_ba =rpois(N_dyad, m_ba),y_ab_z = (y_ab -median(c(y_ab, y_ba))) /sd(c(y_ab, y_ba)),y_ba_z = (y_ba -median(c(y_ab, y_ba))) /sd(c(y_ab, y_ba)),t = i ) } # i list %>%bind_rows() %>%return() } # fct
We run the function, and show how the number of interactions in one direction \(y_{[a, b]}\) (blue) and the opposite direction \(y_{[b, a]}\) (yellow), on the y-axis, change as a function of time (x-axis) for 16 dyads (one per panel).
# Run the functionset.seed(2666) df4 <-scm.4(nb_loops =200)# Plot df4 %>%filter(dyad <17) %>%gather(direction, y, m_ab:m_ba) %>%ggplot(aes(x = t, y = y, group = direction, color =as.factor(direction))) +geom_line(alpha =0.7) +facet_wrap(~ dyad) +scale_color_manual(values =c("#d3c94e", "#3d6a77")) +theme(legend.position ="none",strip.background =element_blank(), strip.text =element_blank()) +labs(x ="Time", y ="")
We also plot \(y_{[a, b]}\) against \(y_{[b, a]}\) in a cross-sectional sample.
Directed Acyclic Graph (note that we only showed a subset of it in the manuscript):
DAG illustrating the assumed causal structure of Figure 1E. For all triads \((a, b, c)\) in the population understudy, \(y_{|a, b|}\) represents the number of (undirected) observed interactions between \(a\) and \(b\), \(y_{|a, c|}\) the number of observed interactions between \(a\) and \(c\), and \(y_{|b, c|}\) the number of observed interactions between \(b\) and \(c\). Like in the previous subsection, these variables are indexed by time \(t\).
scm.5<-function(N_ind =12, # Nb of individualsN_dyad = ((N_ind * N_ind) - N_ind) /2, # Nb of dyadsN_triad =t(combn(N_ind, 3))[, 1] %>%length(), #Nb triadsp_initial =0.2, # Inital proba of forming a tie yp_sw_1_0 = (1/200), # switch from 1 to 0 (loss of tie)p_sw_0_1 = (1/2000), # switch from 0 to 1 (new random tie)n_l_i =100, # Number of iterations i (time)n_l_j =15){ # Number of considered triads undergoing triadic closure ### Create objects# Empty data frame with name and dyad id basic_df <-tibble(ind_a =t(combn(N_ind, 2))[, 1],ind_b =t(combn(N_ind, 2))[, 2],dyad =paste0(ind_a, "_", ind_b) ) list <-list() # list containing data frames at different t list[[1]] <-list() # list containing data frames at different t list[[2]] <-list() # list containing data frames at different t d <- list### Iteration 1## Step 1# Edge generated with fixed probability (list[[1]][[1]] <- basic_df %>%mutate(y_ab =rbinom(N_dyad, 1, p_initial),t =1 ))## STEP 2# No triangle closure during the first iteration list[[2]][[1]] <- list[[1]][[1]]### Iterations ifor (i in2:n_l_i){## STEP 1# Random creation and losses of tie list[[1]][[i]] <- list[[2]][[i-1]] %>%mutate(rs =runif(n(), 0, 1),y_ab =case_when( rs < p_sw_1_0 & y_ab ==1~0, rs >= p_sw_1_0 & y_ab ==1~1, rs < p_sw_0_1 & y_ab ==0~1, rs >= p_sw_0_1 & y_ab ==0~0),t = i ) %>%select(-rs)## STEP 2# Note that there is a non-null prob. that the same triad# is selected twice in the same iteration, in which case# nothing happens: the triad is already closedfor (j in1:n_l_j){# Randomly select one dyad a-b (dyad_ab <- list[[1]][[i]] %>%filter(y_ab ==1) %>%slice_sample(n =1))# Save the a and b's ids (dyad_ab_ids <- dyad_ab %>%select(ind_a, ind_b) %>%unlist())# Sample one individual c among all their potential partners (id_c <-sample(c(1:N_ind)[-c(dyad_ab_ids)], 1))# Select dyad a-c (dyad_ac <- list[[1]][[i]] %>%filter(dyad ==paste0(id_c, "_", dyad_ab_ids[1]) | dyad ==paste0(dyad_ab_ids[1], "_", id_c)))# Select dyad b-c (dyad_bc <- list[[1]][[i]] %>%filter(dyad ==paste0(id_c, "_", dyad_ab_ids[2]) | dyad ==paste0(dyad_ab_ids[2], "_", id_c)))## Data frame at iteration i, step 2:# If two out of the three edges exist, close the triangleif (dyad_ab$y_ab + dyad_ac$y_ab + dyad_bc$y_ab ==2) { list[[2]][[i]] <- list[[1]][[i]] %>%mutate(y_ab =ifelse(dyad == dyad_ab$dyad | dyad == dyad_ac$dyad | dyad == dyad_bc$dyad, 1, y_ab))# Otherwise, no change } else {list[[2]][[i]] <- list[[1]][[i]]} } # j# Reformat data (d[[i]] <- list[[2]][[i]] %>%select(-c(t, dyad)) %>%rename(y = y_ab, ind_i = ind_a, ind_j = ind_b))# Assign each triad... d[[i]] <-tibble(ind_a =t(combn(N_ind, 3))[, 1],ind_b =t(combn(N_ind, 3))[, 2],ind_c =t(combn(N_ind, 3))[, 3],# a tried IDtriad =c(1:N_triad) ) %>%left_join(d[[i]], by =c("ind_a"="ind_i", "ind_b"="ind_j")) %>%rename(y_ab = y) %>%left_join(d[[i]], by =c("ind_b"="ind_i", "ind_c"="ind_j")) %>%rename(y_bc = y) %>%left_join(d[[i]], by =c("ind_a"="ind_i", "ind_c"="ind_j")) %>%rename(y_ac = y) } output <-list(list = list,d = d )return(output)} # scm.5
library(animation)saveGIF({for (i in1:100) {# Filter data to include only ties with y_ab == 1 ties <-subset(df5$list[[2]][[i]], y_ab ==1)# Create a graph object graph <-tbl_graph(edges = ties, directed =FALSE) p <-ggraph(graph, 'linear', circular =TRUE) +geom_edge_link(color ="#3A5B71") +geom_node_point(fill ='#3D485B',shape =21,color ="#f0f2f2",size =5 ) +labs(title =paste0("Time step ", i)) +theme_void() print(p) }}, interval = .2, movie.name =".gif",ani.width =1500, ani.height =1500, ani.res =600)
We then look at the association created by the causal process. Specifically, we plot the association between \(y_{|a,b|}(t) + y_{|a,c|}(t)\) and \(y_{|b,c|}(t)\).
In this section, we present a generative model, generate synthetic data from it, and compare the estimates from two different statistical models. In doing so, we show that failing to block the backdoor path on the DAG leads to a biased estimate for the causal effect of interest.
Directed Acyclic Graph:
DAG illustrating the assumed causal structure of Figure 2.\(X_{[a]}\) and \(X_{[b]}\) are individual-level phenotypic features (e.g., colouration); \(S_{[a]}\) and \(S_{[b]}\) represent individual-level sampling effort; \(S_{[a,b]}\) is the dyad-level sampling effort.
And plot the marginal posterior distribution of \(\beta_{\gamma}\), for the two statistical models (adjusted by \(S_{[a, b]}\) at the bottom, and non-adjusted at the top):
We observe that the model that conditions on \(S_{|a,b|}\) (adj, bottom) recovers the true effect (in blue), whereas the model that does not condition on \(S_{|a,b|}\) (unadj, bottom) produces a biased estimate.